Approximation Algorithms for Min-Sum k-Clustering and Balanced k-Median
Authors
Abstract
We consider two closely related fundamental clustering problems in this paper. In min-sum k-clustering, one is given a metric space and has to partition the points into k clusters while minimizing the sum of pairwise distances between the points within the clusters. In the Balanced k-Median problem the instance is the same and one has to obtain a clustering into k clusters C1, . . . , Ck, where each cluster Ci has a center ci, while minimizing the total assignment cost for the points in the metric; here the cost of assigning a point j to a cluster Ci is equal to |Ci| times the distance between j and ci in the metric.

In this paper, we present an O(log n)-approximation for both these problems, where n is the number of points in the metric that are to be served. This is an improvement over the O(ε^{-1} log^{1+ε} n)-approximation (for any constant ε > 0) obtained by Bartal, Charikar, and Raz [STOC '01]. We also obtain a quasi-PTAS for Balanced k-Median in metrics with constant doubling dimension.

As in the work of Bartal et al., our approximation for general metrics uses embeddings into tree metrics. The main technical contribution in this paper is an O(1)-approximation for Balanced k-Median in hierarchically separated trees (HSTs). Our improvement comes from a more direct dynamic programming approach that heavily exploits properties of standard HSTs. In this way, we avoid the reduction to special types of HSTs that were considered by Bartal et al., thereby avoiding an additional O(ε^{-1} log^ε n) loss.

Supported by NSERC.
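The two objectives defined above can be made concrete with a short sketch. This is a minimal illustration, not the paper's algorithm; the function names and the `dist` callback interface are our own:

```python
import itertools

def min_sum_cost(clusters, dist):
    """Min-sum k-clustering objective: sum of pairwise distances
    between points that lie in the same cluster."""
    return sum(dist(p, q)
               for C in clusters
               for p, q in itertools.combinations(C, 2))

def balanced_k_median_cost(clusters, centers, dist):
    """Balanced k-Median objective: assigning point j to cluster C_i
    with center c_i costs |C_i| * dist(j, c_i)."""
    return sum(len(C) * sum(dist(j, c) for j in C)
               for C, c in zip(clusters, centers))

# Example on the real line with dist(a, b) = |a - b|:
d = lambda a, b: abs(a - b)
print(min_sum_cost([[0, 1, 2], [10]], d))                # pairwise sums: 1 + 2 + 1 = 4
print(balanced_k_median_cost([[0, 1, 2], [10]], [1, 10], d))  # 3 * (1 + 0 + 1) + 1 * 0 = 6
```

When each center is chosen among the cluster's own points, the triangle inequality relates the two objectives within a constant factor, which is why results for one problem typically transfer to the other.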
Similar Papers
Sublinear-Time Approximation for Clustering Via Random Sampling
In this paper we present a novel analysis of a random sampling approach for three clustering problems in metric spaces: k-median, min-sum k-clustering, and balanced k-median. For all these problems we consider the following simple sampling scheme: select a small sample set of points uniformly at random from V and then run some approximation algorithm on this sample set to compute an approximati...
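The sampling scheme described in that abstract can be sketched as follows. This is a hypothetical illustration under our own assumptions: the greedy farthest-point rule stands in for "some approximation algorithm" run on the sample, and we assume `sample_size >= k`:

```python
import random

def sample_based_clustering(points, k, sample_size, dist, seed=0):
    """Sketch of sample-then-cluster: draw a uniform sample, pick k
    centers from the sample only, then assign all points to them."""
    rng = random.Random(seed)
    # 1. Uniform random sample of the input (the only points inspected closely).
    sample = rng.sample(points, min(sample_size, len(points)))
    # 2. Placeholder approximation on the sample: greedy farthest-point
    #    selection of k centers (any approximation algorithm could be used here).
    centers = [sample[0]]
    while len(centers) < k:
        centers.append(max(sample, key=lambda p: min(dist(p, c) for c in centers)))
    # 3. Assign every original point to its nearest sampled center.
    clusters = {c: [] for c in centers}
    for p in points:
        clusters[min(centers, key=lambda c: dist(p, c))].append(p)
    return centers, clusters
```

The point of the analysis in such results is that the cost of the clustering induced on all of `points` stays close to the cost found on the small sample, so the running time depends mainly on the sample size rather than on n.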
Full Text

Finding Low Error Clusterings
A common approach for solving clustering problems is to design algorithms to approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pairwise distance or similarity information. However, in many learning motivated clustering applications (such as clustering proteins by function) there is some unknown target clustering; in such cases the pairw...
Full Text

Approximate clustering without the approximation
Approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much research effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that these better approxi...
Full Text

A Survey on Exact and Approximation Algorithms for Clustering
Given a set of points P in R^d, a clustering problem is to partition P into k subsets {P1, P2, ..., Pk} in such a way that a given objective function is minimized. The most studied cost functions for a cluster, μ(Pi), are the maximum or average radius of Pi, the maximum diameter of Pi, and the maximum width of Pi. The overall objective function is ⊕ μ(Pi), where ⊕ is typically the Lp-norm operator. The mo...
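The per-cluster cost functions named in that abstract can be sketched briefly. This is our own minimal illustration (function names and the `dist` interface are assumptions, and centers are restricted to cluster points for simplicity):

```python
def cluster_radius(C, dist):
    """Radius of a cluster: smallest, over candidate centers in C,
    of the maximum distance from the center to any point of C."""
    return min(max(dist(c, p) for p in C) for c in C)

def cluster_diameter(C, dist):
    """Diameter of a cluster: largest pairwise distance within C."""
    return max(dist(p, q) for p in C for q in C)

def lp_objective(clusters, mu, p):
    """Combine per-cluster costs mu(P_i) with an L_p norm, the
    typical choice for the aggregation operator."""
    return sum(mu(C) ** p for C in clusters) ** (1 / p)

# Example on the real line:
d = lambda a, b: abs(a - b)
print(cluster_radius([0, 1, 4], d))    # center 1 gives max distance 3
print(cluster_diameter([0, 1, 4], d))  # farthest pair (0, 4) gives 4
```

With p = 1 this recovers a sum of cluster costs, and p = ∞ (a max over clusters) recovers the "maximum radius/diameter" objectives mentioned above.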
Full Text

Distributed Balanced Clustering via Mapping Coresets
Large-scale clustering of data points in metric spaces is an important problem in mining big data sets. For many applications, we face explicit or implicit size constraints for each cluster which leads to the problem of clustering under capacity constraints or the “balanced clustering” problem. Although the balanced clustering problem has been widely studied, developing a theoretically sound di...
Full Text